An Exploratory Study of the Malay Text Processing Tools in Ontology Learning

نویسندگان

  • Mohd Zakree Ahmad Nazri
  • Siti Mariyam Shamsudin
  • Azuraliza Abu Bakar
چکیده

This paper discusses the overall process of learning taxonomy from Malay texts using unsupervised conceptual clustering approach and investigates the existing Malay NLP tools as potential pre-processing tools for the proposed ontology learning approach. The tools are a maximum-entropy parser based on open NLP package, a word sense tagger and a parser based on pola grammar. A case study approach is adopted in this study deemed suitable for exploratory research. The result for each NLP tool shows a lower recall and precision. The poor result is caused by several factors such as the texts being used in this experiment.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

متن کامل

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

متن کامل

A Hybrid Approach for Learning Concept Hierarchy from Malay Text Using GAHC and Immune Network

The human immune system provides inspiration in the attempt of solving the knowledge acquisition bottleneck in developing ontology for semantic web application. In this paper, we proposed an extension to the Guided Agglomerative Hierarchical Clustering (GAHC) method that uses an Artificial Immune Network (AIN) algorithm to improve the process of automatically building and expanding the concept ...

متن کامل

Semantic Similarity Measures for Malay Sentences

The concept of semantic similarity is an important element in many applications such as information extraction, information retrieval, document clustering and ontology learning. Most of the previous works regarding semantic similarity measures have been traditionally defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence that the concepts participat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008